Ref 

# 


Hits 


Search Query 


DBs 


Default 
Operator 


Plurals 


Time Stamp 


LI 


18 


"704"/$.ccls. and (speech near3 
recogni$) and (((probability near 
densitv^ or odf} same fsDeech near 
model$l)) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 17:56 


SI 


1 


09/866596 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:10 


S2 


60 


(speech near3 recogni$) and (MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$))) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:23 


S3 


26 


"7047$.ccls. and (speech near3 
recogni$) and (MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$))) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:23 


S4 


7 


(speech near3 recogni$) and (MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$))) and @ad< "20000620" and 
((probability near density) or pdf) 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:21 


S5 


64 


"704"/$ .eels, and (speech near3 
process$3) and (MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$))) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:48 


S6 


80 


"704"/$.ccls. and (speech near3 
process$3) and (((MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$)))) or arma) 
and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:49 


S7 


3 


"704"/$.ccls. and (speech near3 
process$3) and (((MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$)))) or arma) 
and (pdf or (probability near density)) 
and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:51 


S8 


4 


"704"/$.ccls. and (speech near3 
(process$3 or recogni$)) and (((MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$)))) or arma) and (pdf or 
(probability near density)) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:06 


S9 


869 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:07 
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SIO 


151 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:11 


Sll 


3 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and (auto$regress or 
ar) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:08 


S12 


12 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and (auto$regress$ 
or ar) and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:08 


S13 


23 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and ((probability 
near density) or pdf) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 17:55 
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Ref 

# 


Hits 


Search Ouerv 


DBs 


Default 
Operator 


Plurals 


Time Stamp 


SI 


1 


09/866596 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:10 


S2 


60 


(speech near3 recogni$) and (MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$))) and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:23 


S3 


26 


"7047$.ccls. and (speech near3 
recogni$) and (MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$))) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:23 


S4 


7 


(speech near3 recogni$) and (MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$))) and @ad< "20000620" and 
((probability near density) or pdf) 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 11:21 


S5 


64 


"7047$.ccls. and (speech near3 
process$3) and (MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$))) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:48 


S6 


80 


"7047$.ccls. and (speech near3 
process$3) and (((MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$)))) or arma) 
and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:49 


S7 


3 


"7047$.ccls. and (speech near3 
process$3) and (((MA or (moving near 
averag$)) and (AR or (auto$regress$ 
or (auto near regress$)))) or arma) 
and (pdf or (probability near density)) 
and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 12:51 


S8 


4 


"704"/$.ccls. and (speech near3 
(process$3 or recogni$)) and (((MA or 
(moving near averag$)) and (AR or 
(auto$regress$ or (auto near 
regress$)))) or arma) and (pdf or 
(probability near density)) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:06 


S9 


869 


"704"/$.ccIs. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:07 


S10 


151 


"7047$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:11 
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Sll 


3 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and (auto$regress or 
ar) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:08 


S12 


12 


"7047$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and (auto$regress$ 
or ar) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 14:08 


S13 


23 


"704"/$.ccls. and (speech near3 
recogni$) and (speaker near3 
(recognition or verification or 
identification)) and ((noise near3 
(suppress$ or reduc$)) or (speech 
near3 enhanc$)) and ((probability 
near density) or pdf) and 
@ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/06 17:55 


S14 


18 


"704 /$.ccls. and (speech near3 
recogni$) and (((probability near 
density) or pdf) same (speech near 
model$l)) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/07 11:16 


S15 


0 


"704"/$.ccls. and (speech near3 
recogni$) and (variance with (signal 
near quality)) and @ad< "20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/07 11:17 


S16 


2 


"704"/$.ccls. and (speech) and 
(variance with (signal near quality)) 
and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/07 11:17 


S17 


3 


"7047$.ccls. and (speech) and 
(variance with ((signal or speech) near 
quality)) and @ad<"20000620" 


US-PGPUB; 
USPAT 


OR 


OFF 


2005/01/07 11:17 
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Bayesian approach to parameter esfnation and 
interpolation of time-varying autoregressive 
processes using the Gibbs sampler 



JJ.Rajan * 2~ ^ 

P.J.W.Rayner 0 '^BL£. CDDV 

SJ.Godsill ^v-ff-y 



Indexing terms: Parameter estimation. Autoregressive process. Gibbs sampler 



Abstract: A nonstationary time series is one in 
which the statistics of the process are a function 
of time; this time dependenc y makes it impossible 
to utilise standard analy ticaii y^aei med^ statistical 
estimators to *~ pafameterise the process ? To 

overcome this difficulty, the time series is 

considered within ~a ~Tinite time interval and is 
modelled as a ji me- vary in g autoregressive (A R) 
process. The AR_cjreEficjejits _that characterise th is 
process ar e functions of time, represen ted by a 
family of basis vectors. The corresponding basis 
coefficients are invariant over the time window 
and have stationary statistical properties. A 
method is described for applying a Markov Chain 
Monte Carlo method known as the' Gibbs 
sampler to the problem of estimating the 
parameters of such a time-varying autoregressive 
(TVAR) model, whose time dependent 
coefficients are modelled by basis functions. The 
Gibbs sampling scheme is then extended to 
include a stage which may be used for 
interpolation. Results on synthetic and real audio 
signals show that the model is flexible, and that a 
Gibbs sampling framework is a reasonable 
scheme for estimating and characterising a time- 
varying AR process. 



1 Introduction 

In most system identification and estimation techniques 
it is necessary to assume that the signal is stationary. 
This requires that the underlying statistics and the 
model parameters that characterise the process are not 
dependent on time. However, this assumption is often 
incorrect for many physical signals encountered in 
speech processing, EEG analysis and seismology. No 
general mathematical framework for dealing with non- 
stationary signals exists, and in practice, the problem of 
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time dependency is circumvented by assuming that the 
process is locally stationary over a relatively short time 
interval, but globally nonstationary. The assumption 
that the process does not depart 'too far* from station- 
ary over a finite time window then allows the applica- 
tion of standard stationary time series analysis 
techniques to the data over each finite time window. 
This procedure is acceptable when the time variation of 
the process is slow, and the number of observations in 
each finite window is sufficient to allow a reasonable 
estimate of the desired model parameters. However, 
from a modelling point of view this approach is clearly 
sub-optimal, in that any change in the parameter val- 
ues can only occur at window boundaries. 

In this paper, a Bayesian framework for modelling 
nonstationary processes using time-varying autoregres- 
sive models is described. This framework is quite gen- 
eral as it allows any form of time variation in the 
parameters to be represented as basis functions. The 
parameters of the TVAR process are estimated using 
an efficient Gibbs sampling scheme. This sampling 
scheme is extended to include an interpolation stage, 
which samples the interpolant from the overall nonsta- 
tionary model of the process. 

2 Time varying AR models 

The modelling of nonstationary signals by autoregres- 
sive (AR) m odels with time-varving c oefficients is a, 
well stud ied_ problem [1=51. Th e signal values are pm- 
jected on to a basis of time function s, thus allowing the 
global signal to be represented by a set of constan t 
p arameters (i.e. the basis coefficients). In effect, this Ts 
just a transfo rmation of the nonstationary signal into a 
different space, where it can be viewed as a linear time- 
invariant process. The goodness of fit achieved using 
this technique is heavily dependent on the subspace 
spanned by the chosen time functions. The schemes 
proposed by Grenier [1], Subba Rao [3] and Li po race 
[6] all assume that the pole movements are relatively 
slow-varying, and can be modelled using a fixed basis 
of time functions. Standard choices for these time func- 
tions include Legendre polynomials, Fourier basis func- 
tions, discrete prolate spheroidal sequences (DPSS) and 
B-splines. The framework is completely general, as it 
allows any family of time functions to be used, taking 
advantage of any available a priori information about 
the signal such as 'seasonal* effects or discontinuities. 
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2. 7 Analysis of time-varying AR models 
A stationary AR process y of order p is of the form 

y{n) = -Ciy(n-l)-c 2 (n-2)-...-Cpy(n-p)+e(n) 

(1) 

such that the current observation at time / = / is 
dependent on the weighted sum of the previous p 
observations (i.e. the observations at / = / - 1, t = / - 2, 
/ = /-/>) plus some stationary excitation noise proc- 
ess e. The weights c applied to the previous observa- 
tions are known as AR coefficients; if these coefficients 
are constant and the poles are inside the unit circle, the 
resulting time series is stationary since the statistics of 
the process are not dependent on time. 

If the AR coefficients are functions that are allowed 
to vary with time, the general model is considerably 
more flexible, as it can be used to represent part icula r, 
types of process where the statistics of the t ime 'series 
are dependent on time. Let x "ble^^^eryarying AR 
process of order p: 

x(n) - -/i(n-l)x(n-l)-/ 2 (n-2)x(n-2)- ... 

- f P {n - p)x(n - p) + e(n) (2) 

where yj is the function that represents the /th AR time- 
varying coefficient, and e[n) is a ^rpj^mean,., stationary 
Gaussian white noise process. 

If the functions f, can be represented by a family of 
m basis vectors {u/.j = 1, m} t the framework used in 
[1, 2) is repeated, and the process can be written as 

rn 

x(n) = — a\jUj{n — 1e)x(n — 1) — ... 

m 

~ ]C a ™ u i ( n ~ P) x ( n - P) + e ( n ) ( 3 ) 

where are time-invariant basis coefficients. The vec- 
tors Uj are assumed to span some vector space that con- 
tains the information needed to represent the functions 
fi to some arbitrary tolerance. Without loss of general- 
ity, assume that the process {x(n):n ~ 1, N + p] is 
observed. Expressing eqn. 3 in matrix notation 
x = -XiUai — ... — XpUa p -t- e 

— -Yiai - ... - Y p a p + e 

= -Ya + e (4) 
where x is a vector of samples {x(i): i = p + 1, N + 
p) y X, is an N diagonal matrix with signal values from 
x(f) to x(N + / - 1) along the main diagonal, and U is a 
N x tn matrix whose columns are the vectors uj. 

If_it assumed that the error residuals a re Gauss ian 
and white, the Jaco bian of tfie~tr^sfbTmaTion betw een, 
the erroTresiduals^ and_th e data (i.e. e — » x) is unity , 
and the likelihood function-is-of the standard form 



Xxla^.-.ap.a*) = (2ira 2 e ) * exp 



- (5) 

Within a Bayesian framework, prior distributions are 
assigned t o the TVAR parameters a, and a } giving an 
expressio^forj he joint distr ibution /;(x, a, T^a p , o e 2 ). 
However, ob taining parameter esti mates from this joint 
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distributiorWs generally nQ4Vlgyial__a nd analytical 
closedjic^irjy^ ForltHis^ reason, 

inrnecessary to resort to^numericaL techniques 7o~per- 
form the parameter,estimation. 

3 Gibbs sampler 

The Gibbs sampler is a Markov chain Monte Carlo 
(MCMC) sche me which allows randoriTVariates to b e 
sampled frojm^ jojnt^r^ p((o f 
_g. oL It is one of t he simplest and^mosl flexible sam- 
pling techniques available for tfris task. The~M _CMC 
sampling schemes oripinateo* in stati stjcaljshvsics and 
generally requ ire si g nificant computational expen se. 
With the increasing availabilit y of_cpmpu ting power 
they are beginning to be appliedTo general problems in 
engineering and signal processing [7-10]. 

The basic idea behind the Gi bbs s am pler is thatJth e 
problem of obtaining samples from a jomTPPF c an be 
broken down to_drawing su ccessive samples from a set 
of PDFs qjjg maller dimens ionality. The application to 
pa ramete r^stlma tj pjijirpj ^ino^ is. a 

P D F ofja rgcdim ension for which the parame ter values 
a re required which maximisejhat PDF, then the Gibb s 
sampler (by dra win g samples from the PDF) enables 
the formation"^r Eistog rams of the parametersTTf onT 
which reasonable point estimates can be inferred. 
To sample varia tes^rbm . the joint PUh p(a> t ft a) 

using th e above scheme: 

p —«=-—, , 

(i) Assume a random starting position (afl, #\ cfi) 

(ii) Draw a sam ple a) 1 from the conditional JDF 
p(a)|eP, cP) ~ 



(iii) Replace juP by co l and continu e the proces s by 
drawing &\ from p(&{cP 9 a> x ). As soon as a ra ndom vari- 
ate is drawn i t is immediately substitu ted back into the 
conditiojiaf PDF 

(iv) Continue this process of success i ve sampling from 
the conditional P DFs for 7V iterations o f the parame- 
ters 




As with other MCMC , methods, this_sampUng~scheme 
require s an initial transient period known as 'burn-in', 
wher e the sampling scheme converge s. Th is should be 
d i scarded. Th e length of the burn-in period depends on 
the dimensi onality and a pojref/'of /correlations betw een 
the differel ^paramete rsjl 1]. Samplesjdra w" after th is 
stage can _be con side r ed as having been drawn from the 
joint PDF J jaJr^j a). 

The Gibbs sampler satis fies the conditio ns of detailed 
balance„[Note I] ancTiFcan be shown that the jo int den- 
sity is an invariant distribution of the Markov Chain. 
More detailed information on the Gibbs sampler and 
its convergence properties can be found in [10, 12-15]. 

4 Prior distributions on the AR coefficients 

To form the joint PDF of aj ljheJx e£43axajn^ters in t he 
rn5ael"s pecTfied by eqn/ j, prior attributions' must be 
assignecj jto the ^actual TVAR coefficients. T hese prioF 
distribution s iqaySe non- informative or may incorp o- 
rate relevant infoTmaTiort regarding jJre~~physical pro c- 
essT~fcach ot these imposes diiierent~constraints on The 
flexibility "51 the mode l. T he first is a non-informativ e 
^Gaus sian prior with a com mon variance. The second is 
~a~~Gaussian smoothing prior which imposes the co n - 

Note I: This implies that un invariant distribution exists and is identical 
to the joint PDF. 
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gtraint that the kth order differences of the TVAR 
coefficients are minimised, and has the effect of 
smoothing the resulting representations of the AR coef- 
ficients. 

4. 7 Gaussian prior 

If no prior knowledge regarding the model parameters 
is available, it is possible to consider the basis coeffi- 
cients as independent normally distributed variables 
with unknown variance in the prior 



'2a 2 



(6) 



p(a|a 2 ) = niI( 27ra2 )* iex P 
t=i j=i 

where m is the number of basis vectors required to reg? 
resent each of the /? AR time-varying coefficients. 

A standard pri ox for application to scale parame ters 
(such as v ariances^ is the inverse Gamm a J JG) density , 
of whicrTth e non-informative Jeffreys' prior [Note 2] is 
a special case [16, 17]. The inverse Gamma prior is of 
the form 

( a 2)-(a+l) 



exp 



[ * 2 0. 



(7) 



0°T{a) 

The parameters a and P can be chosen to make the. 
inverse Gamma prior on c^^ifTuse and „pre vent the 
term from collapsing to zero. (This is similar to impos- 
ing a constraint tfiat excludes the trivial solution.) This 
is often necessary for hyperpa^rameters^\\^ 
although for well conditioned terms such as the err ox 
"r esid ual v^mv^^^^Txion ^T6m?^.\\^ Jeffreys' pri or 
wttl normally ~"~5e satisfactory^ Hence, an mverse 
Gamma prior is assigned _to o 2 and a Jeffreys' priorj o 
oJ\ If aj^ [a t r , -tr/ag 7 ] 7 *, the joint posterior probabili ty 
w 3ensity_^n-be-expres5fcr as 

p(x,a,^,a 2 ) = p(x|a ) a e )p(a*)p(a|a)p(<7 2 ) (8) 

In the Appendix (Section 9.1), the appropriate condi- 
tional PDFs of the parameters are obtained from this 
expression for the joint PDF to facilitate application of 
the Gibbs sampler. 

4.2 Gaussian smoothing prior 
_ln c e rtain a pplications, one may pos sess some pri or 
knowledge as Tfrrtrrr^fiml natnreTir the nonstationa r- 
ity in terms of t he variation of the statistics rel ating to 
the process/ This informatio n is usually expres sed in 
fairly vague terms such as 'smoothness'^ of the time 
dependent coefficients d ue to known physical con- 
straints on the underjyln^process. For exam ple, if it j s 
known that there exist s a physical constraint on the 
variation of the AR coefficients, such that their time 
dependence is 'smooth' in the sense that the second- 
order differences are small (i.e. approximately maxi- 
mally flat to degree 2), then this information can be 
incorporated by placing constraints on the model 
parameters q v The (N - 2) x N matrix D is defined 
such that Dx = x(n) - 2x(n - 1 ) + x(n - 2) is the vector 
of second order differences. 

The smoothness constraint applied to the y'th basis 
vector, in the representation of the ith TVAR coeffi- 
cient, can be expressed in terms of the probability den- 
sity function of the basis coefficient, conditioned on 
some regularisation constant X fJ . Hence 

Note 2: The Jeffreys* prior is an inverse Gamma density in the limit (with 
appropriate seating factors) where a = 0 and ($ = ». Assigning the Jef- 
freys" prior to cr results in the distribution /Her) = l/cr\ The Jeffreys' 
prior is known as an improper prior as it is not a normalised PDF. 



p(a a -|A,-i, X irn ) 

= (2^ri|Rri|§exp[-iaTRr 1 a i 



(9) 



where Rf 1 = 2I, r U 7 t> r DUZ, and Z ( isamxm diagonal 
matrix with terms {V^y. / = 1, ™} along the main 
diagonal. 

The regularisation constants A,-,- describe the ratio of 
the goodness of fit to the degree of smoothing the con- 
straint applies to the reconstructed signal [19]. This 
explicitly measures the required degree of constraint, in 
terms of the actual smoothness provided by each basis 
vector included in the reconstructed signal. In certain 
applications, values for X t j may be known as functions 
of the data length. The use of the smoothness con- 
straint ensures that high frequency basis vectors do not 
have an excessive effect on each of the AR coefficient 
representations, yet at the same time provides a reason- 
able approximation to the actual time variations of the 
AR coefficient. 

The physical interpretation of the regularisers is as 
inverse variance terms which act as a set of independ- 
ent weights on each of the basis coefficients. A possible 
prior distribution to apply on the A ( y is the Gamma dis- 
tribution given by 



p(Xij\a 0i Po) = 



OfQ-1 



j8*°r(a 0 ) 



exp 



0o 



(10) 



The Oq and ft 0 parameters are chosen so that the 
Gamma PDF is diffuse and has a mean of one. Since 
the regularisers act as independent weighting functions 
on the basis coefficients, it is reasonable to assign prior 
PDFs on the regularisers that are centred around unity. 
This assumes that there is no prior information as to 
the degree of regularisation required by each basis coef- 
ficient. 

The joint probability density function of the data and 
the parameters can be written as 

p(x, a, A n , A Xmj A pm , a e ) = p(x|a, a e )p(cr 2 ) 

P P m 

x Y[p(Bi i \X il ,... : X im )l[Ylp(X ij \ ao ,p 0 ) (11) 

1=1 t=l j=l 

where the diffuse Gamma prior on the regularisers X fj - 
prevents the terms from going to zero. In a similar 
manner to that shown in the Appendix (Section 9.1), 
the appropriate conditional PDFs of the parameters 
can be obtained from the joint PDF using the Gibbs 
sampler. 

5 Interpolation of missing data 

The problem of interpolating missing samples of a 
time- varying autoregressive process is considered, with 
the data samples {x(i): 1 s i & m) and {x(i): m + r + I 
s i s N} as observed data, and {x(t): m + 1 s / s /n + 
r} as missing data. The data vector x is partitioned so 
that y represents the observed data and z is the missing 
data. The problem of interpolation can then be posed 
as follows: draw a sample from the conditional PDF 
p(z|y) so as to obtain a representative sample of the 
missing data, conditioned on the observed data. 

The application of the Gibbs sampler to this task 
allows an interpolant for the missing section of data to 
be obtained, that is typical of the time-varying AR 
process, and not one that simply best fits the missing 
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section. This is desirable in many practical situations 
such as audio restoration, where it is evident that inter- 
polation based on a maximum likelihood (ML) or 
expectation maximisation (EM) restoration of missing 
samples will often result in an interpolant which is 
atypical of the underlying process [9, 16]. This reason- 
ing corresponds with the statistical physics interpreta- 
tion of the estimation of the missing samples. Both 
maximum likelihood estimation and Gibbs sampling 
draw variates from the Boltzmann distribution {exp(- 
(l/7)log p(x\O t o 2 )]} but at different temperatures T. 
Maximum likelihood corresponds to estimation at T = 
0, while Gibbs sampling draws variates at a tempera- 
ture of unity, the ambient temperature [10, 12]. Sam- 
pling from the predictive density /?(z|y) therefore results 
in a typical interpolation, since the interpolant and 
observed data are at the same temperature 7* = 1. 

Some expressions that detail the implementation of 
the method are derived. By rewriting eqn. 4 the error 
residuals can be expressed in terms of y and z: 

e = A„y + A z z (12) 

where A y and A z are lower triangular matrices contain- 
ing the actual TVAR coefficients f,. The Jacobian of 
the transformation of variables from the error residuals 
to the data (e {y, z}) is unity and the likelihood 
function is given by 



p(z l y|ff?,fi,...,fp) = (27rol)->exp 



1 

2ol 



x (z r A 2 z + 2z T AfA^y + y r A^ A„y)J 

(13) 

By representing the TVAR coefficients using a set of 
basis functions, the joint PDF of the data (both 
observed and missing) and the associated parameters 
can be formed as in eqns. 8-1 1 . However, for the pur- 
poses of interpolation only the prft^jctivf, rirn cit y /»f»|y> 
-' s jgoujred-" ^ ne otner Par ameters such as th e AR cnef- 
ficients and the error residual varia nts are unimpor- 
tant [12TT£ 16], Hence, after the Gibbs sampler has 
converged, MAP (maximum a posteriori) estimates for 
the other parameters should be substituted back into 
the expression for the joint PDF to give a first order 
approximate expression for the predictive density 



p(z|y) ^p(z\y % 0 MA p^MAp) 



(14) 



This can then be used to obtain an approximate sample 
for the missing data z. In the Appendix (Section 9.2) 
the conditional PDF of the missing data z is derived 
from the joint PDF given in eqn. 13. 

6 Results 

5. 1 Gaussian prior 

A synthetic example describing the application of the 
Gibbs sampler to the problem of estimating TVAR 
coefficients is presented. Choosing a relatively simple 
problem for which the correct set of basis functions is 
known beforehand allows one to verify that the Gibbs 
Sampler does indeed converge, and draws samples 
from the correct conditional probability distributions. 
It should be noted that even for this simple problem, a 
closed form analytical solution does not exist [1,2, 20]. 
The Gibbs sampler is applied to a data set consisting of 
2000 samples of a synthetically generated time-varying 
autoregressive process of order 5. The functional forms 



plea ui 



(/ is sampleTT uniformly on the interval [0, 2*j) of the 
TVAR coefficients are given by/i(0 = 0.6 sin(2/), f 2 (t) 
= -0.4 cos(/),/ 3 (/) = -0.3 cos(20,/4(0 = 0.6 sin(r), and 
AO) = -0.7 cos(/) respectively. A Fourier basis set con- 
sisting of the five functions U/ = [1, sin(/), cos(/). 
sin(2/), cos(2/)] 7 " was used. Fig. I shows an example of 
the true TVAR coefficient and its corresponding esti- 
mate determined using the Gaussian prior on the coef- 
ficients. 




2500 



Fig.1 4th TVAR coefficient (solid line) and Gibbs sampler estimate 
(dashed line) 




045 03 0.95 

Fi 9 • 2 H istosntm of o e ~ 
True value = 1 .00 
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Fig. 2 shows the histogram of the error residual vari- 
ance (the true value is 1.00; the estimated value is 
1.004, and the estimated standard deviation is 0.03; the 
estimates from the histogram are made using a 
weighted average of the samples in each bin). These 
results show that the estimates are in good agreement 
with the true TVAR coefficients /</). In this example, 
2500 iterations of the Gibbs sampler were performed. 
Conservatively, the first 1000 were discarded as 'burn- 
in*. It should be noted that in practice, the parameter 
values appeared to converge very quickly, usually 
within 1 5 to 20 iterations. 

6.2 Gaussian smoothing prior 
Applying the Gaussian smoothing prior to the data in 
Section 6.1 gave almost identical results. This is to be 
expected, since the smoothness of the TVAR coeffi- 
cients is primarily dictated by the smoothness of the 
basis functions used. In this case, the five Fourier basis 
functions were of low frequency, so the effect of the 
smoothing constraint was not appreciable. 
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' To demonstrate the Gaussian "smoothing prior, the 
TVAR coefficients were estimated for the same data, 
but using a different basis set U A , consisting of the five 
Fourier basis vectors plus another five (arbitrary) high 
frequency basis vectors. Figs. 3 and 4 show the repre- 
sentation of the fourth TVAR coefficient using the 
Gaussian prior and Gaussian smoothing prior respec- 
tively. It is clear from these figures that the Gaussian 
smoothing prior representation is less 'noisy 1 , in that it 
attempts to suppress the effect of the high frequency 
basis functions. In practice, this will be of little impor- 
tance if the set of basis vectors chosen contains only 
low frequency basis functions. 

0.8, ■ ■ < 1 




Fig. 4 4 th TVAR coefficient (dashed line) and estimate (solid line) 
using basis set iff, and Gaussian smoothing prior 

6.3 Interpolation 

6.3. 7 Synthetic AR{5) data: The problem of inter- 
polation is considered with regard to the time-varying 
AR(5) process described in Section 6.1: 200 samples 
between /i — 801 and n = 1000 inclusive are removed 
and considered 'missing*. The Gibbs sampler is then 
applied to the data, performing a full parameter esti- 
mation and an interpolation of the missing data. After 
5000 iterations (2000 are discarded as , burn-in , ) f MAP 
estimates of the parameters are made (i.e. the values of 
the basis coefficients a and the error residual variance 
cr e 2 ). An approximate predictive density p(z\y) is 
obtained by substituting these parameter values back 
into the conditional PDF for the interpolant, given by 
eqn. 28. A sample from the predictive density is chosen 
as a typical representation of the missing data. 



Fig. 5 shows interpolant obtained using the 
Gibbs sampler method described above. The estimate 
of the residual error variance o e 2 is 0.9949, and there is 
no discernible difference between the representations of 
the TVAR coefficients and the Gibbs sampler estimates 
obtained without interpolation. 




1300 
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600 700 800 900 1000 1100 1200 1300 
sample number 
Fig. 6 Synthetic data: ML restoration 

For comparison, an ML (maximum likelihood) esti- 
mate [21, 22] of the interpolant is also determined as 
shown in Fig. 6. In this case, a e 2 is estimated as 0.8818 
and the representations of the TVAR coefficients are 
inferior to those obtained using the Gibbs sampler; a 
plot of the ML estimate of the 4th TVAR coefficient is 
given in Fig. 7 (cf. Fig. 1). 

0.8i ■ 1 ■ 1 
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Fi g . 7 4 th TV A R coefficient (solid line) and ML estimate ( dashed line) 
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Figs. 5 and 6 show tha^he Gibbs sampler interpo- 
lant is more typical of the underlying time-varying 
process than that obtained using the ML method. 

6.3.2 Chirp data: The benefits of using the Gibbs 
sampler for estimating the basis coefficients of a time- 
varying AR model have been illustrated. However, is 
there any advantage in using a time-varying AR model, 
as opposed to a stationary AR model? Consider a 
'noisy chirp 1 signal x = sin(j3/ 2 ) + v, where the chirp 
rate jS = jit, / is sampled uniformly on the interval [0, 
2;t\ and v is Gaussian white noise with a standard devi- 
ation of 0.1. The signal is clearly nonstationary. One 
hundred samples are removed, starting with sample n = 
700, and the Gibbs sampler is applied, assuming a sta- 
tionary AR model of order 30 in one case and a time- 
varying AR model (using five Fourier basis vectors to 
represent the TVAR coefficients) of order 30 in the 
other. The interpolants formed are shown in Figs. 8 
and 9 respectively, and the interpolant in Fig. 9 using 
the time-varying AR is clearly a better interpolation, 
owing to the improved modelling of the data. 

1.5. r— . , , , . 




sample number 

Fig. 8 Interpolation of diirp using a stationary AR model and Gibbs 
sampler 

true chirp signal 

Gibbs sampler inlerpolanis 

Interpolated area is within dotted vertical lines 
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Fig. 9 Interpolation of chirp using a time-varying AR model unit Gibbs 
sampler 

true chirp signal 

Gibbs sampler interpolants 

Interpolated area is within dotted vertical lines 

6.3.3 Audio data: Audio data are nonstationary but 
can be considered stationary over a relatively small 
time interval [16]. In general, the data are modelled 
using autoregressive models over the lime interval dur- 



ing whicnVr data can be considered stationary. As the 
time variation of the data is relatively slow, the use of a 
time- varying autoregressive model would be useful in 
order to model large sections of data and represent the 
underlying process more accurately. Fig. 10 shows 3000 
samples of real audio data (solo vocal music sampled 
at 44.1kHz) that are clearly nonstationary. 

3000i » : : ' ■ rn ■ 1 




sample number 

Fig. 10 Audio data 

Dotted vertical lines indicate interpolation sections 

A time-varying autoregressive model of order 25 is 
applied to the data shown in Fig. 10. A basis set con- 
sisting of the first eight Legendre polynomial basis 
functions is chosen to represent the TVAR coefficients. 
The choice of basis set is problem-dependent. Initially, 
a Fourier basis set was chosen but the convergence and 
variance properties of the basis coefficients were found 
to be inferior to Legendre polynomial basis functions. 
The choice of model order was made by plotting a 
curve of approximate mean squared residual error 
against model order, and choosing the model order 
where it appeared that the mean squared error levelled 
off. This model order selection stage could be formu- 
lated within a Bayesian procedure [20, 23]. 

3000 1 i ; ■ ■ ; ■ 1 




450 550 650 750 850 950 1050 
sample number 

Fig. 11 Gibbs sampler restoration (solid line) and true data (dashed 
line) 

N 0 = 600: L - 200 

Three sections, each of L samples, are removed from 
the data block, as shown by the dotted vertical lines in 
Fig. 10. Each missing block consists of L samples from 
n = Nq + 1 to n = No + L inclusive. A Gibbs sampler 
with a Gaussian smoothing prior is used to estimate 
the joint density of the three sections of missing data 
and the unknown mode) parameters. Figs. 11-13 show 
the restorations of the L = 200 blocks of missing data 
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'at N 0 = 600, N 0 = 1400 and /^^2400 respectively. 
The restorations appear to Tit* the missing section in 
each case, and are typical of the observed data in that 
local vicinity. This demonstrates that a time-varying 
AR model can successfully represent an entire nonsta- 
tionary data block as shown in Fig. 10. 

2000, . : ■ : ■ , 




-2000 

•25001 . 1 . . ! ■ . 

1250 1350 U50 1550 1650 1750 1850 
sample number 

Fig. 12 Gibbs sampler restoration (solid tine) and true data (dashed 
line) 

N 0 = 1400; L = 200 
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Fig* 13 Gibbs sampler restoration (solid line) and true data (dashed 
line) 

N 0 = 2400; L - 200 

7 Conclusion 

The problem of applying the Gibbs sampler to the esti- 
mation of parameters for time-varying AR processes 
has been considered. Using a Bayesian approach, the 
prior on the time-varying auto regressive coefficients 
can be assigned so as to incorporate constraints such as 
the degree of smoothness of the parametric variation. 
Two priors on the time-varying coefficients have been 
considered: a Gaussian prior and a Gaussian smooth- 
ing prior which constrains the coefficient variations to 
be smooth to the fcth degree. 

The Gibbs sampler has been applied to several exam- 
ples of synthetic and real data and the scheme is found 
to be an effective method for characterising these hier- 
archical models. In addition, a simple extension to the 
sampling scheme allows an interpolation stage to be 
included. The type of nonstationary data resulting from 
time-varying AR processes, thought to be common in 
many engineering applications such as speech and 
audio processing, cannot be modelled or characterised 
using standard analytical techniques. However, apply- 
ing the Gibbs sampling scheme to the problem allows 



histograms of eao^PT the free parameters to be deter- 
mined, from which reasonable point estimates can be 
inferred. 
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10 Appendix: Derivation of conditional 
probability density functions 

The basic model of the time-varying AR process is 
assumed 

x{n) = -/ l (n-l)xCa-l)-/ 2 (vi-2)x(n-2)- ... 

- / P (n - ~ P) + e(n) (15) 
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where e(ri) is stationary GlB&an white noise. Each of 
the TVAR coefficients f, is represented by a set of basis 
functions {uj:j - 1, .... m) or by a stochastic process r f . 
The residual error sequence e can be written as 

e = x + X 1 f 1 + ... + X p f p (16) 
In all cases, the Jacobian of the transformation 
between the error residuals and the data (i.e. e x) is 
unity and the likelihood function is in the standard 
Gaussian form 



p(x|fi,...,f p ,<7 2 ) = (27ra2)-#exp 



1 

2al 



x x T x + 2x T £ XA + tfXfXjf,- 

\ t=l i=l 3-1 j 

(17) 

The parametric form of the conditional PDFs for each 
of the parameters is of a similar form to the likelihood 
and prior distributions. This is a property of conjugate 
priors [17]. The conditional PDFs for the model 
parameters are Gaussian and those of all the remaining 
variances are inverse Gamma distributions, for which 
well known sampling methods exist [24]. 

70. 7 Gaussian prior 

The joint PDF of the data and the parameters is given 
by 

p(x,a,<7 2 ,a 2 ) = p(x|a, ^)p(^)p(a|a 2 )p(<7 2 |a 0) )9o) 

(18) 

= (27ra e 2 )-* 



x exp 



_ J_( x t x + 2x T Ya + a T Y T Ya) 



x (27ra 2 )-^ exp|-^a T a] 



( CT 2)-(a 0 +l) 

x _ , — : — exp 



/3°°(<*o) 



(19) 



10. 1. 1 Conditional density for the error resid- 
ual variance oJ: 



p(<r*|x,a) « (<xl)-% exp 
oc (^)-<* +1 > 



exp 



(20) 
(21) 



J_E] 
>?2. 

where E = x r x + 2x^8 + a^^a. The conditional 
PDF for o 2 is in the form of an inverse Gamma (IG) 
density with a = N/2 and P = 2/E. 

10. 1.2 Conditional density for the basis coeffi- 
cients a: 



p(a|x,<r 2 ,<r 2 ) ocexp 



[ 2** 



(2x T Ya + a r Y r Ya r ) 



x exp 



T 
a 1 a 



(22) 



oc exp 



[-- 

p 

JJexp 



*=i 



(23) 



Computatio^^y, it is quicker to sample the basis vec- 
tors a, separately, rather than jointly as a concatenated 
vector a. However, the basis coefficients a ? that repre- 
sent the ^th TVAR coefficients are dependent on the 
other basis coefficients a^ = {a,-: / = 1, p\ i * q). 
Thus the basis coefficients can be be sampled in blocks 
that represent each TVAR coefficient. 



p(ajx,a a 2 ,cr 2 ) oc exp 



-^|2x-Y,a, 



i*=l 



7 



+ 2 £ aJ-YjY iai +a^Y,a, 



oc exp 
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2 









(■•/«> 



2a 2 
\ 



a^a. 



(24) 



The conditional density of the basis coefficients a q is a 
multivariate Gaussian. Sample from a Gaussian distri- 
bution with mean u. = -A" J b and co variance matrix C 
= A" 1 where 



{ °1 " 2 



»=i 



(25) 
(26) 



10.1.3 Conditional density for the hyperpa- 
rameter o 2 : 

p(a 2 |a 0 , A) ) 



oc (<r 2 ) exp 



T 



2<j2 



2\-(a 0 + l) 



exp 



[ <r 2 0o 

ec(^,-C^ + Oexp[-^(^ + i)] (27) 

This conditional density is in the form of an inverse 
Gamma density. The hyperparameter o 2 should be 
sampled from IG(pm/2 + Oq, 2/y2 + (^a r a)). 

10.2 Interpolation of missing data 
Without assuming any specific prior on the TVAR 
coefficients, the joint PDF for the observed and miss- 
ing data conditioned on the parameters of TVAR coef- 
ficient model is given by 

P(z|yi ff2»fi.».,fp) 

This conditional PDF is multivariate Gaussian with 
mean jx and covariance matrix C given by 

li = — (Af A z ) _1 (A^A y y) (29) 
C = (A'fA,)- 



cx exp 



- ~{.z T X[ A z z + 2z T ArA y y) 



(28) 



(30) 
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